HMM-based speech recognition using state-dependent, linear transforms on Mel-warped DFT features

نویسندگان

  • Rathinavelu Chengalvarayan
  • Li Deng
چکیده

cessing techniques, there are no theoretical reasons why the In this paper, we investigate the interactions of front-end feature extraction and back-end classification techniques in HMM based speech recognizer. This work concentrates on finding the optimal linear transformation of Mel-warped short-time DFT information according to the ininiinuni classification ei-ror criterion. These transformations, along with the HMM parameters, are automatically trained using the gradient descent method to minimize a measure of overall empirical error count. The discriminatively derived state-dependent transformations on the DFT data are then combined with their first time derivatives to produce a basic feature set. Experimental results show that Mel-warped DFT features, subject to appropriate transformation in a state-dependent manner, are more effective than the Mel-frequency cepstral coefficients that have dominated current speech recognition technology. The best error rate reduction of 9% is obtained using the new model, tested on a TIMIT phone classification task, relative to conventional HMM.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

HMM-based speech recognition using state-dependent, discriminatively derived transforms on mel-warped DFT features

In the study reported in this paper, we investigate interactions of front-end feature extraction and back-end classification techniques in hidden Markov model-based (HMMbased) speech recognition. The proposed model focuses on dimensionality reduction of the mel-warped discrete fourier transform (DFT) feature space subject to maximal preservation of speech classification information, and aims at...

متن کامل

Frequency warping and robust speaker verification: a comparison of alternative mel-scale representations

Accuracy of speaker verification is high under controlled conditions but falls off rapidly in the presence of interfering sounds. This is because spectral features, such as Mel-frequency cepstral coefficients (MFCCs), are sensitive to additive noise. MFCCs are a particular realization of warped-frequency representation with low-frequency focus. But there are several alternative, potentially mor...

متن کامل

Hierarchical subband linear predictive cepstral (HSLPC) features for HMM-based speech recognition

In this paper, a new approach for linear prediction (LP) analysis is explored, where predictor can be computed from a mel-warped subband-based autocorrelation functions obtained from the power spectrum. For spectral representation a set of multi-resolution cepstral features are proposed. The general idea is to divide up the full frequency-band into several subbands, perform the IDFT on the mel ...

متن کامل

Amplitude modulation features for emotion recognition from speech

The goal of speech emotion recognition (SER) is to identify the emotional or physical state of a human being from his or her voice. One of the most important things in a SER task is to extract and select relevant speech features with which most emotions could be recognized. In this paper, we present a smoothed nonlinear energy operator (SNEO)-based amplitude modulation cepstral coefficients (AM...

متن کامل

Evaluation of a speech recognition / generation method based on HMM and straight

We propose a method for integrating speech recognition and generation within a unified framework. The method consists of STRAIGHT, warped-frequency DCT, and an HMM engine. The warped-frequency DCT is used to derive a kind of mel-cepstral coefficient from the smoothed spectrum of STRAIGHT, which is known as a high-quality vocoder. This analysis/synthesis method has potential to improve the perfo...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1996